Kernel Bandwidth Selection for SVDD: Peak Criterion Approach for Large Data
Abstract
Support vector data description (SVDD) is a useful approach, with various practical applications, for constructing a description of multivariate data for single-class classification and outlier detection. The Gaussian kernel used in the SVDD formulation allows a flexible data description defined by observations designated as support vectors. The data boundary of such a description is nonspherical and conforms to the geometric features of the data. By varying the Gaussian kernel bandwidth parameter, the SVDD-generated boundary can be made either smoother (more spherical, which might lead to underfitting) or tighter and more jagged (which might result in overfitting). Kakde et al. [1] proposed a peak criterion for selecting an optimal value of the kernel bandwidth that strikes a balance between the smoothness of the data boundary and the method's ability to capture the general geometric shape of the data. The peak criterion approach involves training the SVDD at various values of the kernel bandwidth parameter. When training data sets are large, the time required to obtain the optimal value of the Gaussian kernel bandwidth parameter according to the peak method can become prohibitively large. This paper proposes an extension of the peak method for the case of large data. The proposed method produces good results when applied to several data sets. Two existing alternative methods of computing the Gaussian kernel bandwidth parameter (coefficient of variation and distance to the farthest neighbor) were modified to allow comparison with the proposed method in terms of convergence. An empirical comparison demonstrates the advantage of the proposed method.
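The bandwidth sweep that the peak criterion relies on can be illustrated with a small, self-contained Python sketch. It assumes C = 1 (every training point must stay inside the boundary), uses the Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 s^2)), and solves the SVDD dual with a simple SMO-style pairwise solver chosen only for brevity. The functions `train_svdd` and `is_inside` and the synthetic data are hypothetical; this is not the solver used by Kakde et al. or in this paper, and it does not implement the peak criterion's selection rule, which the abstract does not spell out. It only shows how the trained description changes as s varies.

```python
import math
import random


def gaussian_kernel(x, y, s):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 s^2)) with bandwidth s."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * s * s))


def train_svdd(points, s, tol=1e-9, max_iter=20000):
    """Hypothetical SVDD trainer for C = 1 (no training point may fall outside).

    Dual: maximize 1 - a^T K a  subject to  sum(a) = 1, a >= 0
    (the term sum(a_i K_ii) equals 1 because K_ii = 1 for the Gaussian
    kernel). Solved with pairwise, SMO-style updates.
    """
    n = len(points)
    K = [[gaussian_kernel(p, q, s) for q in points] for p in points]
    alpha = [1.0 / n] * n
    # G[k] = (K a)_k; the gradient of a^T K a is 2 * G.
    G = [sum(K[k][j] * alpha[j] for j in range(n)) for k in range(n)]
    for _ in range(max_iter):
        # Move weight from the active point with the largest gradient (j)
        # to the point with the smallest gradient (i).
        i = min(range(n), key=lambda k: G[k])
        j = max((k for k in range(n) if alpha[k] > 0.0), key=lambda k: G[k])
        gap = G[j] - G[i]
        if gap < tol:
            break  # optimality (KKT) conditions hold to within tol
        curvature = 2.0 - 2.0 * K[i][j]
        step = alpha[j] if curvature <= 0.0 else min(alpha[j], gap / curvature)
        alpha[i] += step
        alpha[j] -= step
        for k in range(n):
            G[k] += step * (K[k][i] - K[k][j])
    quad = sum(alpha[k] * G[k] for k in range(n))  # a^T K a
    sv = [k for k in range(n) if alpha[k] > 1e-6]
    # With C = 1 every support vector lies on the boundary: R^2 = 1 - 2 G_s + a^T K a.
    r2 = sum(1.0 - 2.0 * G[k] + quad for k in sv) / len(sv)
    return {"points": points, "s": s, "alpha": alpha, "sv": sv,
            "quad": quad, "r2": r2, "objective": 1.0 - quad}


def is_inside(model, z, margin=1e-6):
    """True if z's feature-space distance to the centre is within the radius."""
    pts, a, s = model["points"], model["alpha"], model["s"]
    cross = sum(a[k] * gaussian_kernel(pts[k], z, s) for k in range(len(pts)))
    return 1.0 - 2.0 * cross + model["quad"] <= model["r2"] + margin


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(40)]
    # Train once per candidate bandwidth; the number of support vectors
    # tends to shrink as the boundary becomes smoother.
    for s in (0.05, 0.2, 0.5, 1.0, 2.0, 5.0):
        m = train_svdd(data, s)
        print(f"s={s:<5} support vectors={len(m['sv']):>2} objective={m['objective']:.4f}")
```

Very small values of s make nearly every observation a support vector (a tight, jagged boundary), while large values leave only a few support vectors and approach a spherical description. Each candidate s requires a full training run on an n-by-n kernel matrix in this sketch, which illustrates why sweeping s on large training sets becomes expensive.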
Similar resources
The Mean and Median Criterion for Automatic Kernel Bandwidth Selection for Support Vector Data Description
Support vector data description (SVDD) is a popular technique for detecting anomalies. The SVDD classifier partitions the whole space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, and the Gaussian kernel is a common choice ...
Determination of optimal bandwidth in upscaling process of reservoir data using kernel function bandwidth
Upscaling based on the bandwidth of the kernel function is a flexible approach to upscale the data because the cells will be coarsened based on variability. The intensity of the coarsening of cells in this method can be controlled with the bandwidth. In a region of smooth variability, a large number of cells will be merged; conversely, cells will remain fine in regions of severe variability. Bandwidth variatio...
A Bayesian approach to bandwidth selection for multivariate kernel density estimation
Kernel density estimation for multivariate data is an important technique that has a wide range of applications. However, it has received significantly less attention than its univariate counterpart. The lower level of interest in multivariate kernel density estimation is mainly due to the increased difficulty in deriving an optimal data-driven bandwidth as the dimension of the data increases. ...
Bandwidth Selection for Multivariate Kernel Density Estimation Using MCMC
Kernel density estimation for multivariate data is an important technique that has a wide range of applications in econometrics and finance. However, it has received significantly less attention than its univariate counterpart. The lower level of interest in multivariate kernel density estimation is mainly due to the increased difficulty in deriving an optimal data-driven bandwidth as the dimens...
Autonomously Determining the Parameters for SVDD with RBF Kernel from a One-Class Training Set
The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, the ease of use is crucial. The results of SVDD are massively determined by the choice of the regularisation parameter C and the kernel parameter σ of the widely used RBF kernel. While for two-clas...
Journal:
- CoRR
Volume: abs/1611.00058
Publication date: 2016